Automated Extraction of Swedish Neologisms using a Temporally

Author

  • Viggo Kann
Abstract

This thesis presents an automated system for extracting neologisms using machine learning approaches. The neologisms are extracted from a large, temporally annotated corpus containing newspaper articles and blog posts. Our system differs from much of the previous research on neologism extraction, and we justify these differences by relating them to current research in evolutionary linguistics. Our main contribution is a system that can incorporate a larger number of features than any previous system. Our approach also enables multiple annotators to provide feedback to the system and affect the way it ranks words with regard to “newness”. In addition, our system can assist users who extract neologisms manually by providing statistics on word usage over time. We also present and analyze words annotated by twelve anonymous annotators with regard to their “newness” and draw conclusions on how language users perceive neologisms.
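The kind of word-usage-over-time statistic mentioned in the abstract could look roughly like the sketch below. It is not the thesis's system: the function names `usage_by_year` and `first_seen`, the `(year, text)` input format, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def usage_by_year(documents):
    """Map each word to {year: relative frequency in that year}.

    `documents` is an iterable of (year, text) pairs (a hypothetical format).
    Tokenization is a plain lowercase whitespace split, which is an assumption.
    """
    counts = defaultdict(Counter)  # year -> word counts for that year
    for year, text in documents:
        counts[year].update(text.lower().split())

    usage = defaultdict(dict)
    for year, counter in counts.items():
        total = sum(counter.values())
        for word, n in counter.items():
            usage[word][year] = n / total
    return dict(usage)


def first_seen(usage, word):
    """Return the earliest year in which `word` occurs, or None if unseen."""
    return min(usage[word]) if word in usage else None
```

A word whose first year of occurrence is recent and whose relative frequency then rises is a plausible candidate for manual inspection; how such statistics would be combined into a ranking is left open here.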


Similar resources

Technical Term Extraction Using Measures of Neology

This study aims to show that the frequency of occurrence over time of technical terms and keyphrases differs from that of general-language terms, in that technical terms and keyphrases show a strong tendency to be recent coinages, and that this difference can be exploited for the automatic identification and extraction of technical terms and keyphrases. To this end, we propose two features extract...


Identification of Neologisms in Japanese by Corpus Analysis

In Japanese and other languages that do not use spaces or other markers between words, the identification and extraction of neologisms and other unrecorded words present particular challenges. In this paper we discuss the problems encountered in neologism identification and describe some of the methods that have been employed to overcome them.


Linking SweFN++ with Medical Resources, towards a MedFrameNet for Swedish

In this pilot study we define and apply a methodology for building an event extraction system for the Swedish scientific medical and clinical language. Our aim is to find and describe linguistic expressions which refer to medical events, such as events related to diseases, symptoms and drug effects. In order to achieve this goal we have initiated actions that aim to extend and refine parts of t...


Experiments in investigating sound symbolism and onomatopoeia

Sound symbolism and onomatopoeia form an interesting area for studying the production and interpretation of neologisms in language. One question is whether neologisms are created haphazardly or governed by rules. Another question is how this can be studied. Of the approximately 60 000 words in the Swedish lexicon, 1 500 have been judged to be sound symbolic (Abelin 1999). These were an...


Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor

Over the last two decades, improvements in computational tools have contributed significantly to the classification of images of biological specimens into their corresponding species. These days, identification of biological species is much easier for taxonomists and even non-taxonomists due to the development of automated computer techniques and systems. In this study, we d...



Publication date: 2010